Overview

This notebook contains the loading component of the Kimberley data loading procedure.

In preparation, the original data will have been

  • uploaded as-is to DPaW's internal CKAN data catalogue,
  • cleaned in OpenRefine (extract & transform),
  • exported as CSV from OpenRefine, and
  • uploaded as additional resources to the CKAN dataset.

This notebook will parse the CSV versions and upload the data to BioSys via its API. Workhorse functions are kept in a separate file, helpers.py.

Setup

Copy secret_template.py to secret.py and edit it to contain your CKAN instance URL and API key. The notebook imports the config dicts CKAN, LCI and BIOSYS from secret.py.
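As a rough guide, secret.py might look like the sketch below. Only the lookups CKAN["dpaw-internal"]["url"] and CKAN["dpaw-internal"]["key"], and the fact that LCI lists resource names against CKAN resource IDs, come from this notebook. Every value is a placeholder, and the layout of BIOSYS is an assumption, since the notebook never reads it.

```python
# secret.py (hypothetical sketch; every value below is a placeholder)

# CKAN instances keyed by a short name. The notebook reads
# CKAN["dpaw-internal"]["url"] and CKAN["dpaw-internal"]["key"].
CKAN = {
    "dpaw-internal": {
        "url": "https://ckan.example.org",  # placeholder instance URL
        "key": "your-ckan-api-key",         # placeholder API key
    },
}

# Resource names (from the original worksheet names) mapped to
# CKAN resource IDs. The IDs here are made up.
LCI = {
    "sites": "00000000-0000-0000-0000-000000000000",
    "birds": "11111111-1111-1111-1111-111111111111",
}

# BioSys connection settings. This layout is an assumption,
# not something the notebook specifies.
BIOSYS = {
    "url": "https://biosys.example.org/api",
    "user": "your-username",
    "password": "your-password",
}
```

Keep secret.py out of version control so the API key and credentials are not published.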


In [1]:
import ckanapi
import csv
import json
import requests

from secret import CKAN, LCI, BIOSYS
import helpers as h

ck is a ckanapi.RemoteCKAN instance that carries your CKAN account's write permissions and can read all public datasets.


In [2]:
ck = ckanapi.RemoteCKAN(CKAN["dpaw-internal"]["url"], apikey=CKAN["dpaw-internal"]["key"])

A CKAN resource's URL changes if the file resource changes, but the resource ID will be persistent.

The config dict LCI maps resource names (taken from the worksheet names of the original data) to their CKAN resource IDs.

A helper function get_data reads all configured datasets (CSV resources in CKAN) and returns a dict of csv.DictReader objects keyed by resource name.


In [10]:
data = h.get_data(ck, LCI)
data


Out[10]:
{'birds': <csv.DictReader instance at 0x7fac2c293638>,
 'birds_camera': <csv.DictReader instance at 0x7fac2c293050>,
 'bycatch': <csv.DictReader instance at 0x7fac2c293d88>,
 'dominant_vegetation': <csv.DictReader instance at 0x7fac2c1f8f80>,
 'ferals': <csv.DictReader instance at 0x7fac2c1f8878>,
 'lookups': <csv.DictReader instance at 0x7fac2c0f6ab8>,
 'mammals': <csv.DictReader instance at 0x7fac2c2936c8>,
 'observations': <csv.DictReader instance at 0x7fac2c0f6f38>,
 'sites': <csv.DictReader instance at 0x7fac2c0f6c68>,
 'stratum_summary': <csv.DictReader instance at 0x7fac2e5435f0>,
 'trapping_effort': <csv.DictReader instance at 0x7fac2c293ea8>,
 'vegetation': <csv.DictReader instance at 0x7fac2c099c20>}
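The get_data helper lives in helpers.py and is not shown in this notebook. The following is a minimal sketch of how such a helper could work, not the actual implementation. It assumes Python 3 and the standard library for the download (the output above suggests the notebook itself ran on Python 2). The names download_text and the download parameter are hypothetical. The sketch looks up each resource by its persistent ID, reads the current URL from the resource metadata, and wraps the downloaded CSV in a csv.DictReader.

```python
import csv
import io
import urllib.request


def download_text(url, apikey=None):
    """Download a URL and return its body as text (assumes UTF-8)."""
    request = urllib.request.Request(url)
    if apikey:
        # CKAN accepts the API key in the Authorization header.
        request.add_header("Authorization", apikey)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def get_data(ck, resource_ids, download=download_text):
    """Return {name: csv.DictReader} for each configured CKAN resource.

    ck is a ckanapi.RemoteCKAN instance; resource_ids maps resource
    names to CKAN resource IDs (like the LCI dict). The download
    parameter is a hypothetical hook so the fetch step can be swapped.
    """
    data = {}
    for name, resource_id in resource_ids.items():
        # The ID is stable; the current file URL is looked up each time.
        resource = ck.action.resource_show(id=resource_id)
        text = download(resource["url"], getattr(ck, "apikey", None))
        data[name] = csv.DictReader(io.StringIO(text))
    return data
```

Passing the download step in as a parameter keeps the lookup logic testable without network access. Note that a csv.DictReader is a one-shot iterator: once its rows have been consumed, get_data has to be called again to re-read them.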

In [11]:
[r for r in data["sites"]][0]


Out[11]:
{'Bearing (degree)': '330',
 'Collector': 'IR',
 'Comments': '',
 'Date established': '2011_07_24',
 'Date revisited': '',
 'Distance to closest water (m) 1': '1300',
 'Distance to closest water (m) 2': '',
 'Geology': 'Sandstone',
 'Geology Code': 'PkI',
 'Landform element (20m radius)': 'Levee',
 'Landform element code': 'LEV',
 'Landform pattern (300m radius)': 'Plateau',
 'Latitude': '-14.83883',
 'Location': 'Ranger Station',
 'Long term annual rainfall': '',
 'Longitude': '125.71429',
 'Photos taken': 'yes',
 'Site No': 'LCI 001',
 'Soil colour 1': 'yellow',
 'Soil colour 2': 'grey',
 'Soil colour 3': 'pale',
 'Soil surface texture': 'sandy loam',
 'Soil surface texture group': 'SL',
 'Survey': 'LCI',
 'Tenure': 'MRNP',
 'Type of closest water 1': 'riverine permanent',
 'Type of closest water 2': '',
 'Underlaying geology ': 'King Leopold Sandstone',
 'Veg': 'Woodland',
 'Year': '2011'}
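Every value in the row above is a string, empty cells come through as '', and at least one column name ('Underlaying geology ') carries a trailing space. Before uploading, rows like this will probably need some tidying. The sketch below shows one possible approach under stated assumptions: the function names clean_row and parse_site are hypothetical, the date format is inferred from the single value '2011_07_24', and Latitude, Longitude and Date established are assumed to be present in every row.

```python
from datetime import datetime


def clean_row(row):
    """Strip whitespace from keys and values; map empty strings to None."""
    cleaned = {}
    for key, value in row.items():
        if key is None:
            # csv.DictReader stores surplus columns under the key None.
            continue
        value = value.strip() if value is not None else ""
        cleaned[key.strip()] = value or None
    return cleaned


def parse_site(row):
    """Convert selected fields of a site row to Python types."""
    site = clean_row(row)
    # Assumes both coordinates are present and numeric.
    site["Latitude"] = float(site["Latitude"])
    site["Longitude"] = float(site["Longitude"])
    # Dates in the export use underscores, e.g. '2011_07_24'.
    site["Date established"] = datetime.strptime(
        site["Date established"], "%Y_%m_%d"
    ).date()
    return site
```

Applied to the first site row, parse_site would yield float coordinates, a date object for 'Date established', and None for blank fields such as 'Date revisited'.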
